MDPs

Over the next several videos, you'll learn how to rigorously define a reinforcement learning problem as a Markov Decision Process (MDP).

Towards this goal, we'll begin with an example!

MDPs, Part 1

## Notes

In general, the state space $\mathcal{S}$ is the set of all nonterminal states.

In continuing tasks (like the recycling task detailed in the video), there are no terminal states, so this is equivalent to the set of all states.

In episodic tasks, we use $\mathcal{S}^+$ to refer to the set of all states, including terminal states.

The action space $\mathcal{A}$ is the set of possible actions available to the agent.

In the event that there are some states where only a subset of the actions is available, we can also use $\mathcal{A}(s)$ to refer to the set of actions available in state $s\in\mathcal{S}$.
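To make the notation concrete, here is a minimal Python sketch of these sets. It assumes the recycling task follows the standard recycling-robot setup (battery states `high` and `low`; actions `search`, `wait`, and `recharge`, with recharging only allowed when the battery is low). The episodic state names at the end are purely hypothetical and only illustrate how the set of all states differs from the set of nonterminal states.

```python
# Continuing task (assumed recycling-robot setup, not taken from these notes).
S = frozenset({"high", "low"})      # state space: all nonterminal states
TERMINAL = frozenset()              # a continuing task has no terminal states
S_PLUS = S | TERMINAL               # all states; identical to S here

A = frozenset({"search", "wait", "recharge"})   # action space

# A(s): the actions available in each state.
# Assumption: the robot cannot recharge when its battery is already high.
A_OF = {
    "high": frozenset({"search", "wait"}),
    "low": frozenset({"search", "wait", "recharge"}),
}


def available_actions(s):
    """Return A(s) for a nonterminal state s."""
    if s not in S:
        raise ValueError(f"{s!r} is not a nonterminal state")
    return A_OF[s]


# Hypothetical episodic task: the terminal state belongs to S+ but not to S.
EPISODIC_S = frozenset({"start", "middle"})
EPISODIC_TERMINAL = frozenset({"goal"})
EPISODIC_S_PLUS = EPISODIC_S | EPISODIC_TERMINAL
```

Each `A_OF[s]` is a subset of `A`, which mirrors the relationship between $\mathcal{A}(s)$ and $\mathcal{A}$. In the continuing task `S_PLUS` equals `S`, while in the episodic sketch `EPISODIC_S` is a strict subset of `EPISODIC_S_PLUS`.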